ABSTRACT



Here we implement novel image based rendering methods. Compared to other techniques, the first method warps
Sprites with Depth, which represent smooth surfaces, without gaps. A second method performs warping for more general
scenes from an intermediate representation called a Layered Depth Image (LDI). An LDI is a view of the scene from a
single input camera, but with multiple pixels along each line of sight. The size of the representation depends on the
depth complexity of the scene. Because the LDI uses a single image coordinate system, McMillan's warp ordering
algorithm can be applied, so pixels are drawn in the output image in back to front order. Alpha compositing can then be
done efficiently without depth sorting and without a z-buffer, which makes splatting an efficient solution to the
resampling problem.

KEYWORDS: Plane Filtered, 3D Mapping, Depth Images, FSPF 

1. INTRODUCTION 

Applications like 3D mapping and reconstruction, shape analysis, pose tracking and object recognition can
potentially benefit from depth image sensors. However, given that indoor mobile robots have limited onboard
computational power, it is infeasible to process the complete 3D point clouds in real time and at full frame rates
(e.g., the Microsoft Kinect sensor produces 9.2M 3D points/sec). Feature extraction, and in particular geometric feature
extraction, is therefore the natural choice for abstracting the sensor data. However, noisy sensing and the presence of
geometric outliers (objects amidst the geometric features that do not match the geometric model of the features) provide
additional challenges to the task of geometric feature extraction. The sprite approximation's fidelity to the correct new
view is highly dependent on the geometry being represented.

Figure: (a) Plane filtered points shown as orange points, with the corresponding convex polygons shown in blue; the
complete 3D point cloud is overlaid as translucent grey for reference. (b) Scene polygon set generated by merging
polygons from 15 consecutive depth image frames.

We introduce the Fast Sampling Plane Filtering (FSPF) algorithm that samples the depth image to produce a
set of "plane filtered" points corresponding to planes, the corresponding plane parameters (normals and offsets) and the
convex polygons in 3D to fit these plane filtered points.

The FSPF algorithm meets the following goals: 

• Reduce the volume of the 3D point cloud by generating a smaller set of "plane filtered" 3D points 

• Compute convex polygons to fit the plane filtered points 

• Iteratively merge convex plane polygons without maintaining a history of all observed plane filtered points 

• Perform all of the above in real time and at full frame rates 
This paper introduces two new extensions to overcome both of these limitations. The first extension is primarily
applicable to smoothly varying surfaces, while the second is useful primarily for very complex geometries. Each method
provides efficient image based rendering capable of producing multiple frames per second on a PC. In the case of sprites
representing smoothly varying surfaces, we introduce an algorithm for rendering Sprites with Depth. The algorithm first
forward maps (i.e., warps) the depth values themselves and then uses this information to add parallax corrections to a
standard sprite renderer. For more complex geometries, we introduce the Layered Depth Image, or LDI, that contains
potentially multiple depth pixels at each discrete location in the image. Instead of a 2D array of depth pixels
(pixels with associated depth information), we store a 2D array of layered depth pixels. A layered depth pixel stores a
set of depth pixels along one line of sight sorted in front to back order. The front element in the layered depth pixel
samples the first surface seen along that line of sight; the next pixel in the layered depth pixel samples the next surface
seen along that line of sight, etc. When rendering from an LDI, the requested view can move away from the original LDI
view and expose surfaces that were not visible in the first layer. The previously occluded regions may still be rendered
from data stored in some later layer of a layered depth pixel.



Figure 1: Different Image Based Primitives Can Serve Well Depending on Distance from the Camera 

The size of the representation grows only linearly with the depth complexity of the image. Moreover, because the
LDI data are represented in a single image coordinate system, McMillan's ordering algorithm [20] can be successfully
applied. As a result, pixels are drawn in the output image in back to front order. No z-buffer is required, so
alpha-compositing can be done efficiently without explicit depth sorting. This makes splatting an efficient solution to
the reconstruction problem.
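The benefit of a back to front drawing order can be illustrated with a minimal sketch of the over compositing operation. The function names and the premultiplied RGBA convention are assumptions made for this illustration, not part of the described renderer.

```python
def over(src, dst):
    """Composite premultiplied RGBA `src` over `dst` (the "over" operator)."""
    keep = 1.0 - src[3]  # fraction of the destination that stays visible
    return tuple(s + keep * d for s, d in zip(src, dst))


def composite_back_to_front(fragments):
    """Blend premultiplied RGBA fragments that arrive farthest first."""
    out = (0.0, 0.0, 0.0, 0.0)  # start from a transparent pixel
    for frag in fragments:
        # Each new fragment is nearer than everything blended so far,
        # so no depth comparison (z-buffer) is needed.
        out = over(frag, out)
    return out


far_red = (1.0, 0.0, 0.0, 1.0)     # opaque red surface, far from the camera
near_green = (0.0, 0.5, 0.0, 0.5)  # half-transparent green (premultiplied), near
pixel = composite_back_to_front([far_red, near_green])
```

Because the fragments already arrive in depth order, the loop never needs to compare or store depth values.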

Sprites with Depth and Layered Depth Images provide us with two new image based primitives that can be used 
in combination with traditional ones. Figure 1 depicts five types of primitives we may wish to use. The camera at the center 
of the frustum indicates where the image based primitives were generated from. The viewing volume indicates the range 
one wishes to allow the camera to move while still re-using these image based primitives. The assumption here is that 
although the part of the scene depicted in the sprite may display some parallax relative to the background environment map 
and other sprites, it will not need to depict any parallax within the sprite itself. Yet closer to the camera, for elements
with smoothly varying depth, Sprites with Depth are capable of displaying internal parallax but cannot deal with
disocclusions.

2. PREVIOUS WORK 

A convex polygon is denoted by the tuple $c = \{\hat{P}, n, \bar{p}, r, b_1, b_2, B\}$ where $\hat{P}$ is the set of 3D
points used to construct the convex polygon, $n$ the number of points in $\hat{P}$, $\bar{p}$ the centroid of the
polygon, $r$ the normal to the polygon plane, $b_1$ and $b_2$ the 2D basis vectors on the plane of the polygon and $B$
the set of 3D points which define the convex boundary of the polygon.



Algorithm 1 Plane Filtering Algorithm

procedure PlaneFiltering(I)
    P, R, C, O <- {}
    n, k <- 0
    while n < n_max and k < k_max do
        k <- k + 1
        d0 <- (rand(0, h - 1), rand(0, w - 1))
        d1 <- d0 + (rand(-η, η), rand(-η, η))
        d2 <- d0 + (rand(-η, η), rand(-η, η))
        Reconstruct p0, p1, p2 from d0, d1, d2                        // eq. 1-3
        r = ((p1 - p0) x (p2 - p0)) / ||(p1 - p0) x (p2 - p0)||       // compute plane normal
        z_bar = (p0.z + p1.z + p2.z) / 3
        w' = w S / (z_bar tan(f_h))
        h' = h S / (z_bar tan(f_v))
        numInliers <- 0
        P_hat <- {}
        R_hat <- {}
        c <- {}
        for j <- 3, l do
            dj <- d0 + (rand(-h'/2, h'/2), rand(-w'/2, w'/2))
            Reconstruct 3D point pj from dj                           // eq. 1-3
            e = abs[r · (pj - p0)]
            if e < ε then
                Add pj to P_hat
                Add r to R_hat
                numInliers <- numInliers + 1
            end if
        end for
        if numInliers > α_in l then
            Add P_hat to P
            Add R_hat to R
            Construct convex polygon c from P_hat
            Add c to C
            n <- n + numInliers
        else
            Add P_hat to O
        end if
    end while
    return P, R, C, O
end procedure
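The sampling loop of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the pinhole back-projection in `reconstruct` stands in for eq. 1-3, which are not reproduced in this text, the default parameter values are hypothetical, and the convex polygon construction is omitted.

```python
import math
import random


def reconstruct(depth, row, col, f_h, f_v):
    """Back-project a pixel to 3D (assumed pinhole model standing in for eq. 1-3)."""
    h, w = len(depth), len(depth[0])
    z = depth[row][col]
    x = z * (2.0 * col / (w - 1) - 1.0) * math.tan(f_h / 2.0)
    y = z * (2.0 * row / (h - 1) - 1.0) * math.tan(f_v / 2.0)
    return (x, y, z)


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def plane_filter(depth, f_h, f_v, n_max=500, k_max=200, l=20,
                 eta=10, S=0.5, eps=0.02, alpha_in=0.8, seed=0):
    """Return (P, R, O): plane filtered points, their normals, and outliers."""
    rng = random.Random(seed)
    h, w = len(depth), len(depth[0])

    def jitter(d, dr, dc):
        # Random offset around pixel d, clamped to the image bounds.
        row = min(max(d[0] + rng.randint(-dr, dr), 0), h - 1)
        col = min(max(d[1] + rng.randint(-dc, dc), 0), w - 1)
        return (row, col)

    P, R, O = [], [], []
    n = k = 0
    while n < n_max and k < k_max:
        k += 1
        d0 = (rng.randint(0, h - 1), rng.randint(0, w - 1))
        d1, d2 = jitter(d0, eta, eta), jitter(d0, eta, eta)
        p0, p1, p2 = (reconstruct(depth, r_, c_, f_h, f_v) for r_, c_ in (d0, d1, d2))
        nrm = cross(sub(p1, p0), sub(p2, p0))
        length = math.sqrt(dot(nrm, nrm))
        z_bar = (p0[2] + p1[2] + p2[2]) / 3.0
        if length < 1e-12 or z_bar <= 0.0:
            continue  # degenerate sample (collinear points or invalid depth)
        r = tuple(v / length for v in nrm)  # plane normal
        w_win = int(w * S / (z_bar * math.tan(f_h)))  # local window size in pixels
        h_win = int(h * S / (z_bar * math.tan(f_v)))
        inliers = []
        for _ in range(3, l):
            dj = jitter(d0, max(1, h_win // 2), max(1, w_win // 2))
            pj = reconstruct(depth, dj[0], dj[1], f_h, f_v)
            if abs(dot(r, sub(pj, p0))) < eps:  # distance of pj to the sampled plane
                inliers.append(pj)
        if len(inliers) > alpha_in * l:
            P.extend(inliers)
            R.extend([r] * len(inliers))
            n += len(inliers)
        else:
            O.extend(inliers)
    return P, R, O
```

On a synthetic depth image of a flat wall, for example `plane_filter([[2.0] * 64 for _ in range(48)], f_h=1.0, f_v=0.8)`, every sampled neighborhood passes the inlier test and the outlier set stays empty.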



Figure 2 



3. RENDERING SPRITES 



Sprites are texture maps or images with alphas (transparent pixels) rendered onto planar surfaces. They can be 
used either for locally caching the results of slower rendering and then generating new views by warping [30, 26, 31, 14], 
or they can be used directly as drawing primitives (as in video games). The texture map associated with a sprite can be 
computed by simply choosing a 3D viewing matrix and projecting some portion of the scene onto the image plane. 
In practice, a view associated with the current or expected viewpoint is a good choice. A 3D plane equation can also be
computed for the sprite, e.g., by fitting a 3D plane to the z-buffer values associated with the sprite pixels. Below, we
derive the equations for the 2D perspective mapping between a sprite and its novel view. This is useful for implementing
a backward mapping algorithm, and it lays the foundation for our Sprites with Depth rendering algorithm.

A sprite consists of an alpha-matted image $I_1(x_1, y_1)$, a 4 × 4 camera matrix $C_1$ which maps from 3D world
coordinates $(X, Y, Z, 1)$ into the sprite's coordinates $(x_1, y_1, z_1, 1)$ ($z_1$ is the z-buffer value), and a plane
equation. This plane equation can either be specified in world coordinates, $AX + BY + CZ + D = 0$, or it can be
specified in the sprite's coordinate system, $a x_1 + b y_1 + c z_1 + d = 0$. In the former case, we can form a new
camera matrix $\hat{C}_1$ by replacing the third row of $C_1$ with the row $[A\ B\ C\ D]$, while in the latter, we can
compute $\hat{C}_1 = P C_1$, where

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ a & b & c & d \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

In either case, we can write the modified projection as

$$\begin{bmatrix} w_1 x_1 \\ w_1 y_1 \\ w_1 d_1 \\ w_1 \end{bmatrix} = \hat{C}_1 \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},$$

where $d_1 = 0$ for pixels on the plane. For pixels off the plane, $d_1$ is the scaled perpendicular distance to the plane
(the scale factor is 1 if $A^2 + B^2 + C^2 = 1$) divided by the pixel to camera distance $w_1$.

Given such a sprite, how do we compute the 2D transformation associated with a novel view $\hat{C}_2$? The mapping
between pixels $(x_1, y_1, d_1)$ in the sprite and pixels $(x_2, y_2, d_2)$ in the output camera's image is given by the
transfer matrix $T_{1,2} = \hat{C}_2 \hat{C}_1^{-1}$. For pixels on the plane ($d_1 = 0$), this reduces to

$$\begin{bmatrix} w_2 x_2 \\ w_2 y_2 \\ w_2 \end{bmatrix} = H_{1,2} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix},$$

where $H_{1,2}$ is the 2D planar perspective transformation (homography) obtained by dropping the third row and
column of $T_{1,2}$. The coordinates $(x_2, y_2)$ obtained after dividing out $w_2$ index a pixel address in the output
camera's image. Efficient backward mapping techniques exist for performing the 2D perspective warp [8, 34], or texture
mapping hardware can be used.
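Backward mapping with a homography can be sketched as follows. This is an illustrative example only: it assumes the 3 × 3 matrix passed in maps output pixels back to sprite pixels, uses nearest-neighbor lookup instead of filtering, and the function names are hypothetical.

```python
def apply_homography(H, x, y):
    """Map (x, y) through a 3x3 homography; returns None if w is (near) zero."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        return None
    return (u / w, v / w)  # divide out w to get a pixel address


def backward_warp(sprite, H_out_to_sprite, out_w, out_h, background=0):
    """For every output pixel, fetch the nearest sprite pixel it maps back to."""
    src_h, src_w = len(sprite), len(sprite[0])
    out = [[background] * out_w for _ in range(out_h)]
    for y2 in range(out_h):
        for x2 in range(out_w):
            mapped = apply_homography(H_out_to_sprite, x2, y2)
            if mapped is None:
                continue
            xi, yi = int(round(mapped[0])), int(round(mapped[1]))
            if 0 <= xi < src_w and 0 <= yi < src_h:
                out[y2][x2] = sprite[yi][xi]
    return out
```

Because every output pixel is visited exactly once, a backward warp of this kind leaves no gaps in the output, which is the property the forward mapping of colors lacks.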

3.1 Sprites with Depth 

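The descriptive power (realism) of sprites can be greatly enhanced by adding an out-of-plane displacement
component $d_1$ at each pixel in the sprite. Unfortunately, such a representation can no longer be rendered directly
using a backward mapping algorithm.

Using the same notation as before, we see that the transfer equation is now

$$\begin{bmatrix} w_2 x_2 \\ w_2 y_2 \\ w_2 \end{bmatrix} = H_{1,2} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} + d_1 e_{1,2}, \qquad (4)$$

where $e_{1,2}$ (the epipole) is the third column of $T_{1,2}$ with its third row dropped.

A solution to this problem is to first forward map the displacements $d_1$, and to then use Equation (4) to perform a
backward mapping step with the new (view-based) displacements. This has two main advantages. The first is that a
forward mapped displacement map does not have to be as accurate as a forward mapped color image, so we can use a
quick single-pixel splat algorithm followed by a quick hole filling, or alternatively, use a simple 2 × 2 splat. The second
is that we can design the forward warping step to have a simpler form by factoring out the planar perspective warp.
Notice that we can rewrite Equation (4) as

$$\begin{bmatrix} w_2 x_2 \\ w_2 y_2 \\ w_2 \end{bmatrix} = H_{1,2} \left( \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} + d_1 e^*_{1,2} \right), \qquad (5)$$

where $e^*_{1,2} = H_{1,2}^{-1} e_{1,2}$. The term in parentheses is a pure parallax warp to intermediate coordinates
$(x_3, y_3)$.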
Figure 3: Plane with Bump Rendering Example: (a) Input Color (Sprite) Image $I_1(x_1, y_1)$; (b) Sprite Warped by
Homography Only (No Parallax); (c) Sprite Warped by Homography and Crude Parallax ($d_1$); (d) Sprite Warped
by Homography and True Parallax ($d_2$); (e) With Gap Fill Width Set to 3; (f) Input Depth Map $d_1(x_1, y_1)$; (g) Pure
Parallax Warped Depth Map $d_3(x_3, y_3)$; (h) Forward Warped Depth Map $d_2(x_2, y_2)$; (i) Forward Warped Depth
Map without Parallax Correction; (j) Sprite with "Pyramid" Depth Map
www.iaset.us 



editor @iaset.us 






G. Pinakapani & T. Venkateswarlu 





Figure 4: Results of Sprite Extraction from Image Sequence: (a) Third of Five Images; (b) Initial Segmentation Into
Six Layers; (c) Recovered Depth Map; (d) The Five Layer Sprites; (e) Residual Depth Image for Fifth Layer;
(f) Re-Synthesized Third Image (Note Extended Field of View); (g) Novel View without Residual Depth;
(h) Novel View with Residual Depth (Note the "Rounding" of the People)

Our novel two-step rendering algorithm thus proceeds in two stages, the second of which has two parts:

• Forward map the displacement map $d_1(x_1, y_1)$, using only the parallax component given in Equation (5), to
obtain $d_3(x_3, y_3)$;

• Backward map the resulting warped displacements $d_3(x_3, y_3)$ using Equation (5) to obtain $d_2(x_2, y_2)$
(the displacements in the new camera view);

• Backward map the original sprite colors, using both the homography $H_{2,1}$ and the new parallax $d_2$ as in
Equation (4) (with the 1 and 2 indices interchanged), to obtain the image corresponding to camera $C_2$.

The last two operations can be combined into a single raster scan over the output image, avoiding the need to
perspective warp $d_3$ into $d_2$. More precisely, for each output pixel $(x_2, y_2)$, we compute $(x_3, y_3)$ such that

$$\begin{bmatrix} w_3 x_3 \\ w_3 y_3 \\ w_3 \end{bmatrix} = H_{2,1} \begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix}$$

to compute where to look up the displacement $d_3(x_3, y_3)$, and form the final address of the source sprite pixel by
adding the parallax term for that displacement, as in Equation (4) with the 1 and 2 indices interchanged.

We can obtain a quicker, but less accurate, algorithm by omitting the first step, i.e., the pure parallax warp from
$d_1$ to $d_3$. If we assume the depth

3.2 Recovering Sprites from Image Sequences 

While sprites and sprites with depth can be generated using computer graphics techniques, they can also be 
extracted from image sequences using computer vision techniques. To do this, we use a layered motion estimation 
algorithm [32, 1], which simultaneously segments the sequence into coherently moving regions, and computes a parametric
motion estimate (planar perspective transformation) for each layer. To convert the recovered layers into sprites, we need to 
determine the plane equation associated with each region. 



4. LAYERED DEPTH IMAGES

While the use of sprites and Sprites with Depth provides a fast means to warp planar or smoothly varying
surfaces, more general scenes require the ability to handle more general disocclusions and large amounts of parallax as the
viewpoint moves. These needs have led to the development of Layered Depth Images (LDI). Like a sprite with depth,
pixels contain depth values along with their colors (i.e., a depth pixel). In addition, a Layered Depth Image (Figure 5)
contains potentially multiple depth pixels per pixel location. The farther depth pixels, which are occluded from the LDI
center, will act to fill in the disocclusions that occur as the viewpoint moves away from the center. The structure of an
LDI is summarized by the following conceptual representation:

Layered Depth Image = Camera: camera

Pixels [0..xres-1, 0..yres-1]: array of Layered Depth Pixel

The layered depth image contains camera information plus an array of size xres by yres layered depth pixels.

In practice, we implement Layered Depth Images in two ways. When creating layered depth images, it is
important to be able to efficiently insert and delete layered depth pixels, so the Layers array in the Layered Depth Pixel
structure is implemented as a linked list. When rendering, it is important to maintain spatial locality of depth pixels in
order to most effectively take advantage of the cache in the CPU [12]. In Section 5.1 we discuss the compact render-time
version of layered depth images.

There are a variety of ways to generate an LDI. Given a synthetic scene, we could use multiple images from nearby
points of view for which depth information is available at each pixel. This information can be gathered from a standard
ray tracer that returns depth per pixel or from a scan conversion and z-buffer algorithm where the z-buffer is also
returned.
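The conceptual LDI representation given in this section can be sketched with a few small Python classes. The field names mirror those used elsewhere in the text (Layers, NumLayers, ColorRGBA, Z, SplatIndex); the Python types, the use of a plain list instead of a linked list, and the `depth_complexity` helper are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DepthPixel:
    color_rgba: int       # packed 32-bit color (ColorRGBA)
    z: float              # depth along the line of sight (Z)
    splat_index: int = 0  # index into the splat size lookup table (SplatIndex)


@dataclass
class LayeredDepthPixel:
    # Depth pixels along one line of sight, kept sorted front to back.
    layers: List[DepthPixel] = field(default_factory=list)

    @property
    def num_layers(self) -> int:
        return len(self.layers)


@dataclass
class LayeredDepthImage:
    camera: object  # camera information; left opaque in this sketch
    xres: int
    yres: int
    pixels: List[List[LayeredDepthPixel]] = field(init=False)

    def __post_init__(self):
        # One (initially empty) layered depth pixel per image location.
        self.pixels = [[LayeredDepthPixel() for _ in range(self.xres)]
                       for _ in range(self.yres)]

    def depth_complexity(self) -> float:
        """Average number of depth pixels per pixel location."""
        total = sum(p.num_layers for row in self.pixels for p in row)
        return total / (self.xres * self.yres)
```

Storage grows with the total number of depth pixels rather than with the number of input views, which is why the average depth complexity is the quantity of interest.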

4.1 LDIs from Multiple Depth Images 

We can construct an LDI by warping n depth images into a common camera view. For example, the depth images
C2 and C3 in Figure 5 can be warped to the camera frame defined by the LDI (C1 in Figure 5). If, during the warp from
the input camera to the LDI camera, two or more pixels map to the same layered depth pixel, their Z values are compared.
If the Z values differ by more than a preset epsilon, a new layer is added to that layered depth pixel for each distinct
Z value (i.e., NumLayers is incremented and a new depth pixel is added); otherwise (e.g., depth pixels c and d in
Figure 5), the values are averaged, resulting in a single depth pixel. This preprocessing is similar to the rendering
described by Max [18]. This construction of the layered depth image is effectively decoupled from the final rendering of
images from desired viewpoints. Thus, the LDI construction does not need to run at multiple frames per second to allow
interactive camera motion.

Figure 5: Layered Depth Image
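The merge rule for samples that land on the same layered depth pixel can be sketched as below. This is a simplified illustration: the running average weighted by a sample count, the default epsilon, and the convention that smaller Z is nearer are assumptions, since the text only states that close values are averaged.

```python
def insert_sample(layers, z, color, eps=0.01):
    """Insert a warped sample into one layered depth pixel.

    `layers` is a list of [z, color, count] entries sorted front to back
    (ascending z). Samples within `eps` of an existing layer are averaged
    into it; otherwise a new layer is added.
    """
    for layer in layers:
        if abs(layer[0] - z) <= eps:
            n = layer[2]
            layer[0] = (layer[0] * n + z) / (n + 1)
            layer[1] = tuple((c * n + s) / (n + 1) for c, s in zip(layer[1], color))
            layer[2] = n + 1
            return
    layers.append([z, tuple(color), 1])
    layers.sort(key=lambda entry: entry[0])  # keep front to back order


layers = []
insert_sample(layers, 2.0, (1.0, 0.0, 0.0, 1.0))    # first surface
insert_sample(layers, 1.0, (0.0, 1.0, 0.0, 1.0))    # a nearer surface: new layer
insert_sample(layers, 2.005, (0.0, 0.0, 1.0, 1.0))  # within eps of z = 2.0: averaged
```

After these three insertions the pixel holds two layers, the nearer one first, and the farther layer carries the averaged depth and color of the two samples that agreed.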

4.2 LDIs from a Modified Ray Tracer 

By construction, a Layered Depth Image reconstructs images of a scene well from the center of projection of the 
LDI (we simply display the nearest depth pixels). The quality of the reconstruction from another viewpoint will depend on 
how closely the distribution of depth pixels in the LDI, when warped to the new viewpoint, corresponds to the pixel 
density in the new image. Two common events that occur are: (1) disocclusions as the viewpoint changes, and
(2) surfaces that are edge on to the LDI coming to cover screen space in the new view. When using a ray tracer, we have
the freedom to sample the scene with any distribution of rays we desire. We could simply allow the
rays emanating from the center of the LDI to pierce surfaces, recording each hit along the way (up to some maximum). 
This would solve the disocclusion problem but would not effectively sample surfaces edge on to the LDI. 

What set of rays should we trace to sample the scene, to best approximate the distribution of rays from all
possible viewpoints we are interested in? For simplicity, we have chosen to use a cubical region of empty space 
surrounding the LDI center to represent the region that the viewer is able to move in. Each face of the viewing cube defines 
a 90 degree frustum which we will use to define a single LDI (Figure 6). The six faces of the viewing cube thus cover all of 
space. For the following discussion we will refer to a single LDI. Each ray in free space has four coordinates, two for 
position and two for direction. Since all rays of interest intersect the cube faces, we will choose the outward intersection to 
parameterize the position of the ray. Direction is parameterized by two angles. 

4.3 LDIs from Real Images 

The dinosaur model in Figure 11 is constructed from 21 photographs of the object undergoing a 360 degree 
rotation on a computer-controlled calibrated turntable. An adaptation of Seitz and Dyer's voxel coloring algorithm [29] is 
used to obtain the LDI representation directly from the input images. The regular voxelization of Seitz and Dyer is 
replaced by a view-centered voxelization similar to the LDI structure. The procedure entails moving outward on rays from 
the LDI camera center and projecting candidate voxels back into the input images. If all input images agree on a color, this 
voxel is filled as a depth pixel in the LDI structure. This approach enables straightforward construction of LDIs from
images that do not contain depth per pixel.

5. RENDERING LAYERED DEPTH IMAGES 

Our fast warping-based renderer takes as input an LDI along with its associated camera information. Given a new
desired camera position, the warper uses an incremental warping algorithm to efficiently create an output image. 
Pixels from the LDI are splatted into the output image using the over compositing operation. The size and footprint of the 
splat is based on an estimated size of the re-projected pixel. 

5.1 Space Efficient Representation 

When rendering, it is important to maintain the spatial locality of depth pixels to exploit the second level cache in 
the CPU [12]. To this end, we reorganize the depth pixels into a linear array ordered from bottom to top and left to right in 
screen space, and back to front along a ray. We also separate out the number of layers in each layered depth pixel from the 
depth pixels themselves. The layered depth pixel structure does not exist explicitly in this implementation. Instead, a 
double array of offsets is used to locate each depth pixel. The number of depth pixels in each scan line is accumulated into
a vector of offsets to the beginning of each scan line. Within each scan line, for each pixel location, a total count of the
depth pixels from the beginning of the scan line to that location is maintained. Thus to find any layered depth pixel, one
simply offsets to the beginning of the scan line and then further to the first depth pixel at that location. This supports
scanning in right-to-left order as well as the clipping operation discussed later.
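The offset scheme can be sketched as follows. This is a minimal illustration using Python lists; the function names and the extra sentinel entry at the end of each scan line's count vector are assumptions made for convenience.

```python
def build_compact(layer_grid):
    """Flatten layer_grid[y][x] (lists of depth pixels, back to front) into one array.

    Returns (depth_pixels, line_offset, pixel_offset) where line_offset[y] is the
    index of the first depth pixel of scan line y and pixel_offset[y][x] is the
    number of depth pixels on that scan line before location x.
    """
    depth_pixels, line_offset, pixel_offset = [], [], []
    for row in layer_grid:
        line_offset.append(len(depth_pixels))
        counts, running = [], 0
        for layers in row:
            counts.append(running)
            running += len(layers)
            depth_pixels.extend(layers)
        counts.append(running)  # sentinel: total count for the scan line
        pixel_offset.append(counts)
    return depth_pixels, line_offset, pixel_offset


def layers_at(compact, x, y):
    """Return the depth pixels stored at location (x, y)."""
    depth_pixels, line_offset, pixel_offset = compact
    begin = line_offset[y] + pixel_offset[y][x]
    end = line_offset[y] + pixel_offset[y][x + 1]
    return depth_pixels[begin:end]


grid = [[["a"], [], ["b", "c"]],
        [["d", "e"], ["f"], []]]
compact = build_compact(grid)
```

Any location can be reached with two additions, so a scan line can be traversed from either end and a clipped sub-rectangle can be addressed without touching the pixels outside it.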

5.2 Incremental Warping Computation 

The incremental warping computation is similar to the one used for certain texture mapping operations [9, 7].
The geometry of this computation has been analyzed by McMillan [22], and efficient computation for the special case of
orthographic input images is given in [3]. Let $C_1$ be the 4 × 4 matrix for the LDI camera. It is composed of an affine
transformation matrix, a projection matrix, and a viewport matrix, $C_1 = V_1 \cdot P_1 \cdot A_1$. This camera matrix
transforms a point from the global coordinate system into the camera's projected image coordinate system. The projected
image coordinates $(x_1, y_1)$, obtained after multiplying the point's global coordinates by $C_1$ and dividing out $w_1$,
index a screen pixel address. The $z_1$ coordinate can be used for depth comparisons in a z-buffer. Let $C_2$ be the
output camera's matrix. Define the transfer matrix as $T_{1,2} = C_2 \cdot C_1^{-1}$. Given the projected image
coordinates of some point seen in the LDI camera (e.g., the coordinates of a in Figure 5), the transfer matrix maps them
into the output camera's image:

$$T_{1,2} \begin{bmatrix} x_1 \\ y_1 \\ z_1 \\ 1 \end{bmatrix} = \begin{bmatrix} x_2 w_2 \\ y_2 w_2 \\ z_2 w_2 \\ w_2 \end{bmatrix} = \mathbf{result}$$

The coordinates $(x_2, y_2)$, obtained after dividing by $w_2$, index a pixel address in the output camera's image.
Using the linearity of matrix operations, this matrix multiply can be factored to reuse much of the computation from each
iteration through the layers of a layered depth pixel; result can be computed as

$$T_{1,2} \begin{bmatrix} x_1 \\ y_1 \\ z_1 \\ 1 \end{bmatrix} = T_{1,2} \begin{bmatrix} x_1 \\ y_1 \\ 0 \\ 1 \end{bmatrix} + z_1 \, T_{1,2} \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} = \mathbf{start} + z_1 \cdot \mathbf{depth}$$

To compute the warped position of the next layered depth pixel along a scanline, the new start is simply
incremented:

$$T_{1,2} \begin{bmatrix} x_1 + 1 \\ y_1 \\ 0 \\ 1 \end{bmatrix} = T_{1,2} \begin{bmatrix} x_1 \\ y_1 \\ 0 \\ 1 \end{bmatrix} + T_{1,2} \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \mathbf{start} + \mathbf{xincr}$$

Figure 6: Values for Size Computation of a Projected Pixel

The warping algorithm proceeds using McMillan's ordering algorithm [20]. The LDI is broken up into four
regions above and below and to the left and right of the epipolar point. For each quadrant, the LDI is traversed in
(possibly reverse) scan line order. At the beginning of each scan line, start is computed. The sign of xincr is determined
by the direction of processing in this quadrant. Each layered depth pixel in the scan line is then warped to the output
image by calling Warp. This procedure visits each of the layers in back to front order and computes result to determine
its location in the output image. As in perspective texture mapping, a divide is required per pixel. Finally, the depth
pixel's color is splatted at this location in the output image.

The following pseudocode summarizes the warping algorithm applied to each layered depth pixel.

procedure Warp(ldpix, start, depth, xincr)
    for k <- 0 to ldpix.NumLayers-1
        z1 <- ldpix.Layers[k].Z
        result <- start + z1 * depth
        // cull if the depth pixel goes behind the output camera
        // or if the depth pixel goes out of the output cam's frustum
        if result.w > 0 and IsInViewport(result) then
            result <- result / result.w
            // see next section
            sqrtSize <- z2 * lookupTable[ldpix.Layers[k].SplatIndex]
            splat(ldpix.Layers[k].ColorRGBA, x2, y2, sqrtSize)
        end if
    end for
    // increment for next layered pixel on this scan line
    start <- start + xincr
end procedure
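The factoring used by Warp can be checked with a short Python sketch. It is an illustration only: it warps one scan line in left to right order, omits the viewport test and the splat, and uses hypothetical helper names; the example transfer matrix in the usage note is made up.

```python
def mat_vec(T, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(T[i][j] * v[j] for j in range(4)) for i in range(4)]


def warp_scanline(T, y1, x_begin, scanline_depths):
    """Warp one LDI scan line into the output image.

    scanline_depths[i] lists the z1 values (back to front) stored at
    x = x_begin + i. Returns the (x2, y2) address of every depth pixel
    that lies in front of the output camera.
    """
    start = mat_vec(T, [x_begin, y1, 0.0, 1.0])  # T applied to (x1, y1, 0, 1)
    depth = mat_vec(T, [0.0, 0.0, 1.0, 0.0])     # third column of T
    xincr = mat_vec(T, [1.0, 0.0, 0.0, 0.0])     # first column of T
    out = []
    for zs in scanline_depths:
        for z1 in zs:
            result = [s + z1 * d for s, d in zip(start, depth)]
            if result[3] > 0:  # cull points behind the output camera
                out.append((result[0] / result[3], result[1] / result[3]))
        # Step to the next layered depth pixel on this scan line.
        start = [s + dx for s, dx in zip(start, xincr)]
    return out
```

For example, with `T = [[1, 0, 0.5, 2], [0, 1, 0.25, -1], [0, 0, 1, 0], [0, 0, 0.1, 1]]`, the call `warp_scanline(T, 5, 3, [[2.0, 1.0], [], [4.0]])` gives the same addresses as multiplying each `(x1, y1, z1, 1)` by `T` directly, while each layer costs only one scaled vector addition and one divide.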

5.3 Splat Size Computation 

To splat the LDI into the output image, we estimate the projected area of the warped pixel. This is a rough
approximation to the footprint evaluation [33] optimized for speed. The proper size can be computed (differentially) as

$$size = \frac{(d_1)^2 \cos(\theta_2) \, res_2 \tan(fov_1 / 2)}{(d_2)^2 \cos(\theta_1) \, res_1 \tan(fov_2 / 2)}$$

where $d_1$ is the distance from the sampled surface point to the LDI camera, $fov_1$ is the field of view of the LDI
camera, $res_1 = (w_1 h_1)^{-1}$ where $w_1$ and $h_1$ are the width and height of the LDI, and $\theta_1$ is the angle
between the surface normal and the line of sight to the LDI camera (see Figure 6). The same terms with subscript 2 refer
to the output camera.

It will be more efficient to compute an approximation of the square root of size,

$$\sqrt{size} = \frac{1}{d_2} \cdot \frac{d_1 \sqrt{\cos(\theta_2) \, res_2 \tan(fov_1 / 2)}}{\sqrt{\cos(\theta_1) \, res_1 \tan(fov_2 / 2)}}
\approx \frac{1}{Z_2} \cdot \frac{d_1 \sqrt{\cos(\phi_2) \, res_2 \tan(fov_1 / 2)}}{\sqrt{\cos(\phi_1) \, res_1 \tan(fov_2 / 2)}}
\approx z_2 \cdot \frac{d_1 \sqrt{\cos(\phi_2) \, res_2 \tan(fov_1 / 2)}}{\sqrt{\cos(\phi_1) \, res_1 \tan(fov_2 / 2)}}$$



We approximate the $\theta$s as the angles $\phi$ between the surface normal vector and the z axes of the camera's
coordinate systems. We also approximate $d_2$ by $Z_2$, the z coordinate of the sampled point in the output camera's
unprojected eye coordinate system. During rendering, we set the projection matrix such that $z_2 = 1/Z_2$. The current
implementation supports 4 different splat sizes, so a very crude approximation of the size computation is implemented
using a lookup table. For each pixel in the LDI, we store $d_1$ using 5 bits. We use 6 bits to encode the normal, 3 for
$n_x$ and 3 for $n_y$. This gives us an eleven-bit lookup table index. Before rendering each new image, we use the new
output camera information to precompute values for the 2048 possible lookup table indexes. At each pixel we obtain
$\sqrt{size}$ by multiplying the computed $z_2$ by the value found in the lookup table:

$$\sqrt{size} \approx z_2 \cdot lookup[n_x, n_y, d_1]$$

To maintain the accuracy of the approximation for $d_1$, we discretize $d_1$ nonlinearly using a simple exponential
function that allocates more bits to the nearby $d_1$ values, and fewer bits to the distant $d_1$ values. The four splat
sizes we currently use have 1 by 1, 3 by 3, 5 by 5, and 7 by 7 pixel footprints. Each pixel in a footprint has an alpha
value to approximate a Gaussian splat kernel. However, the alpha values are rounded to 1, 1/2, or 1/4, so the alpha
blending can be done with integer shifts and adds.
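The eleven-bit lookup index and the shift-based blend can be sketched as below. The logarithmic spacing, the distance range `D_MIN` to `D_MAX`, the normal quantization, and the bit order inside the index are all assumptions for illustration; the text only specifies the bit counts and that the discretization of the distance is nonlinear.

```python
import math

D_BITS, N_BITS = 5, 3       # 5 bits for d1, 3 bits each for nx and ny
D_MIN, D_MAX = 0.1, 100.0   # hypothetical distance range covered by the codes


def quantize_distance(d1):
    """Map a distance to a 5-bit code with finer steps for nearby values."""
    d = min(max(d1, D_MIN), D_MAX)
    t = math.log(d / D_MIN) / math.log(D_MAX / D_MIN)  # 0..1, log spaced
    return min(int(t * (1 << D_BITS)), (1 << D_BITS) - 1)


def quantize_normal_component(n):
    """Map a normal component in [-1, 1] to a 3-bit code."""
    levels = (1 << N_BITS) - 1
    clamped = min(max(n, -1.0), 1.0)
    return int(round((clamped + 1.0) * 0.5 * levels))


def splat_index(d1, nx, ny):
    """Combine the three codes into one 11-bit lookup table index."""
    return ((quantize_distance(d1) << (2 * N_BITS))
            | (quantize_normal_component(nx) << N_BITS)
            | quantize_normal_component(ny))


def blend_shift(src, dst, shift):
    """Blend integer channel values with alpha = 1 / 2**shift (1, 1/2 or 1/4)."""
    return dst + ((src - dst) >> shift)
```

With this layout the index always falls in the range 0 to 2047, so a 2048-entry table precomputed per output camera covers every depth pixel.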

5.4 Depth Pixel Representation 

The size of a cache line on current Intel processors (Pentium Pro and Pentium II) is 32 bytes. To fit four depth 
pixels into a single cache line we convert the floating point Z value to a 20 bit integer. This is then packed into a single 
word along with the 11 bit splat table index. These 32 bits along with the R, G, B, and alpha values fill out the 8 bytes. 
This seemingly small optimization yielded a 25 percent improvement in rendering speed. 
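The 8-byte layout can be sketched with Python's `struct` module. The linear quantization of Z between `z_near` and `z_far`, the position of the fields inside the 32-bit word, and the little-endian byte order are assumptions for illustration; the text only gives the field widths.

```python
import struct

Z_BITS, INDEX_BITS = 20, 11
Z_MAX_CODE = (1 << Z_BITS) - 1
INDEX_MASK = (1 << INDEX_BITS) - 1


def pack_depth_pixel(z, z_near, z_far, splat_index, rgba):
    """Pack one depth pixel into 8 bytes: a 32-bit word plus R, G, B, A."""
    zq = int((z - z_near) / (z_far - z_near) * Z_MAX_CODE + 0.5)
    zq = min(max(zq, 0), Z_MAX_CODE)                  # clamp to 20 bits
    word = (zq << INDEX_BITS) | (splat_index & INDEX_MASK)
    return struct.pack("<I4B", word, *rgba)


def unpack_depth_pixel(data, z_near, z_far):
    """Inverse of pack_depth_pixel; Z is recovered up to quantization error."""
    word, r, g, b, a = struct.unpack("<I4B", data)
    zq = word >> INDEX_BITS
    z = z_near + zq / Z_MAX_CODE * (z_far - z_near)
    return z, word & INDEX_MASK, (r, g, b, a)


packed = pack_depth_pixel(12.5, 0.0, 100.0, 1234, (10, 20, 30, 255))
```

Each packed depth pixel occupies exactly 8 bytes, so four of them fill one 32-byte cache line.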



Figure 7: LDI with Two Segments
5.5 Clipping

The LDI of the chestnut tree scene in Figure 11 is a large data set containing over 1.1 million depth pixels. If we
naively render this LDI by reprojecting every depth pixel, we would only be able to render at one or two frames per
second. When the viewer is close to the tree, there is no need to warp those pixels that will fall outside of the new view.
Unseen pixels can be culled by intersecting the view frustum with the frustum of the LDI. This is implemented by
intersecting the view frustum with the near and far plane of the LDI frustum, and taking the bounding box of the
intersection. This region defines the rays of depth pixels that could be seen in the new view. This computation is
conservative, and gives suboptimal results when the viewer is looking at the LDI from the side (see Figure 7). The view
frustum intersects almost the entire cross section of the LDI frustum, but only those depth pixels in the desired view need
be warped. Our simple clipping test indicates that most of the LDI needs to be warped. To alleviate this, we split the LDI
into two segments, a near and a far segment (see Figure 7). These are simply two frusta stacked one on top of the other.
The near frustum is kept smaller than the back segment. We clip each segment individually, and render the back segment
first and the front segment second. Clipping can speed rendering times by a factor of 2 to 4.

6. RESULTS 

While the LDIs are allocated with a maximum of 10 layers per pixel, the average depth complexity for these LDIs
is only 1.24. Thus the use of three input images only increases the rendering cost by 24 percent. The renderer
(running concurrently in a high-priority thread) generates images at 300 by 300 resolution. On a Pentium II PC running at
300 MHz, we achieved a frame rate of 6 to 10 frames per second. Figures 9 and 10 show two cross-eye stereo pairs of a
chestnut tree. In Figure 9 only the near segment is displayed. Figure 10 shows both segments in front of an environment
map. The LDIs were created using a modified version of the Rayshade raytracer. The tree model is very large; Rayshade
allocates over 340 MB of memory to render a single image of the tree. The stochastic method discussed in Section 4.2
took 7 hours to trace 16 million rays through this scene using an SGI Indigo2 with a 250 MHz processor and 320 MB of
memory. The resulting LDI has over 1.1 million depth pixels, 70,000 of which were placed in the near segment with the
rest in the far segment. When rendering this interactively we attain frame rates between 4 and 10 frames per second on a
Pentium II PC running at 300 MHz.




Figure 8: Barnyard Scene 
Figure 9: Near Segment of Chestnut




Figure 10: Chestnut Tree in Front of Environment 




Figure 11: Dinosaur Model Reconstructed from 21 Photographs 

7. DISCUSSIONS 

In this paper, we have described two novel techniques for image based rendering. The first technique renders
Sprites with Depth without visible gaps, and with a smoother rendering than traditional forward mapping (splatting)
techniques. It is based on the observation that a forward mapped displacement map does not have to be as accurate as a
forward mapped color image. If the displacement map is smooth, the inaccuracies in the warped displacement map result
in only sub-pixel errors in the final color pixel sample positions.

Our second novel approach to image based rendering is a Layered Depth Image representation. The LDI
representation provides the means to display the parallax induced by camera motion as well as reveal disoccluded regions.
The average depth complexity in our LDIs is much lower than one would achieve using multiple input images
(e.g., only 1.24 in the Chicken LDI). The LDI representation takes advantage of McMillan's ordering algorithm, allowing
pixels to be splatted back to front with an over compositing operation. Traditional graphics elements and planar sprites
can be combined with Sprites with Depth and LDIs in the same scene if a back-to-front ordering is maintained. In this
case they are simply composited onto one another. Without such an ordering a z-buffer approach will still work at the
extra cost of maintaining depth information per frame.

Choosing a single camera view to organize the data has the advantage of having sampled the geometry with a
preference for views very near the center of the LDI. This also has its disadvantages. First, pixels undergo two resampling
steps in their journey from input image to output. This can potentially degrade image quality. Secondly, if some surface is
seen at a glancing angle in the LDI's view, the depth complexity for that LDI increases, while the spatial sampling
resolution over that surface degrades. The sampling and aliasing issues involved in our layered depth image approach are
still not fully understood; a formal analysis of these issues would be helpful.

With the introduction of our two new representations and rendering techniques, there now exists a wide range of
different image based rendering methods available. At one end of the spectrum are traditional texture-mapped models.
When the scene does not have too much geometric detail, and when texture-mapping hardware is available, this may be
the method of choice. If the scene can easily be partitioned into non-overlapping sprites (with depth), then triangle-based
texture-mapped rendering can be used without requiring a z-buffer [17, 4].

None of these representations, however, explicitly accounts for certain variations of scene appearance with
viewpoint, e.g., specularities, transparency, etc. View-dependent texture maps [5], and 4D representations such as
lightfields or Lumigraphs [15, 7], have been designed to model such effects. These techniques can lead to greater realism
than static texture maps, sprites, or Layered Depth Images, but usually require more effort (and time) to render.

In future work, we hope to explore representations and rendering algorithms which combine several image based
rendering techniques. Automatic techniques for taking a 3D scene (either synthesized or real) and re-representing it in the
most appropriate fashion for image based rendering would be very useful. These would allow us to apply image based
rendering to truly complex, visually rich scenes, and thereby extend their range of applicability.